258 research outputs found

    Amodal Segmentation Through Out-of-Task and Out-of-Distribution Generalization With a Bayesian Model


    VoGE: A Differentiable Volume Renderer using Gaussian Ellipsoids for Analysis-by-Synthesis

    Differentiable rendering allows the application of computer graphics on vision tasks, e.g. object pose and shape fitting, via analysis-by-synthesis, where gradients at occluded regions are important when inverting the rendering process. To obtain those gradients, state-of-the-art (SoTA) differentiable renderers use rasterization to collect a set of nearest components for each pixel and aggregate them based on the viewing distance. In this paper, we propose VoGE, which uses ray tracing to capture nearest components with their volume density distributions on the rays and aggregates via the integral of the volume densities based on Gaussian ellipsoids, which brings more efficient and stable gradients. To efficiently render via VoGE, we propose an approximate closed-form solution for the volume density aggregation and a coarse-to-fine rendering strategy. Finally, we provide a CUDA implementation of VoGE, which gives a competitive rendering speed in comparison to PyTorch3D. Quantitative and qualitative experimental results show VoGE outperforms SoTA counterparts when applied to various vision tasks, e.g., object pose estimation, shape/texture fitting, and occlusion reasoning. The VoGE library and demos are available at https://github.com/Angtian/VoGE.
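    The closed-form volume density aggregation can be illustrated in miniature. The sketch below integrates a single Gaussian's density along a ray in closed form via the error function; it assumes an isotropic Gaussian (the paper's solution covers full anisotropic ellipsoids), and the function and variable names are illustrative, not VoGE's actual API.

    ```python
    import math

    def gaussian_ray_integral(o, d, mu, sigma):
        """Closed-form integral of an isotropic 3D Gaussian density
        exp(-||o + t*d - mu||^2 / (2 sigma^2)) along the ray t in [0, inf),
        where d is a unit direction vector."""
        diff = [m - oi for m, oi in zip(mu, o)]
        # Ray parameter of closest approach to the Gaussian center.
        t_peak = sum(di * df for di, df in zip(d, diff))
        # Squared distance from the ray line to the center.
        q = sum(df * df for df in diff) - t_peak * t_peak
        a = t_peak / (sigma * math.sqrt(2.0))
        # Separating the peak offset q from the along-ray 1D Gaussian gives
        # a closed form in terms of erf -- no numerical quadrature needed.
        return (math.exp(-q / (2.0 * sigma * sigma))
                * sigma * math.sqrt(math.pi / 2.0)
                * (1.0 + math.erf(a)))
    ```

    The decomposition into a perpendicular offset term and a 1D along-ray integral is what makes per-pixel aggregation cheap: each component contributes one `exp` and one `erf` per ray.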

    Super-CLEVR: A Virtual Benchmark to Diagnose Domain Robustness in Visual Reasoning

    Visual Question Answering (VQA) models often perform poorly on out-of-distribution data and struggle with domain generalization. Due to the multi-modal nature of this task, multiple factors of variation are intertwined, making generalization difficult to analyze. This motivates us to introduce a virtual benchmark, Super-CLEVR, where different factors in VQA domain shifts can be isolated so that their effects can be studied independently. Four factors are considered: visual complexity, question redundancy, concept distribution and concept compositionality. With controllably generated data, Super-CLEVR enables us to test VQA methods in situations where the test data differs from the training data along each of these axes. We study four existing methods, including two neural symbolic methods, NSCL and NSVQA, and two non-symbolic methods, FiLM and mDETR; and our proposed method, probabilistic NSVQA (P-NSVQA), which extends NSVQA with uncertainty reasoning. P-NSVQA outperforms the other methods on three of the four domain shift factors. Our results suggest that disentangling reasoning and perception, combined with probabilistic uncertainty, forms a strong VQA model that is more robust to domain shifts. The dataset and code are released at https://github.com/Lizw14/Super-CLEVR.

    HandFlow: Quantifying View-Dependent 3D Ambiguity in Two-Hand Reconstruction with Normalizing Flow

    Reconstructing two-hand interactions from a single image is a challenging problem due to ambiguities that stem from projective geometry and heavy occlusions. Existing methods are designed to estimate only a single pose, despite the fact that there exist other valid reconstructions that fit the image evidence equally well. In this paper, we propose to address this issue by explicitly modeling the distribution of plausible reconstructions in a conditional normalizing flow framework. This allows us to directly supervise the posterior distribution through a novel determinant magnitude regularization, which is key to producing varied 3D hand pose samples that project well into the input image. We also demonstrate that metrics commonly used to assess reconstruction quality are insufficient to evaluate pose predictions under such severe ambiguity. To address this, we release the first dataset with multiple plausible annotations per image, called MultiHands. The additional annotations enable us to evaluate the estimated distribution using the maximum mean discrepancy metric. Through this, we demonstrate the quality of our probabilistic reconstruction and show that explicit ambiguity modeling is better suited for this challenging problem.
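    The maximum mean discrepancy evaluation mentioned above can be sketched generically. The snippet below is a standard biased MMD² estimator with an RBF kernel over flattened pose vectors; the kernel choice and bandwidth are illustrative assumptions, not the paper's exact protocol.

    ```python
    import numpy as np

    def mmd2_rbf(X, Y, bandwidth=1.0):
        """Biased (V-statistic) squared maximum mean discrepancy between
        sample sets X (n, d) and Y (m, d) under an RBF kernel.
        Near zero when the two sample sets are drawn from the same
        distribution; grows as the distributions separate."""
        def k(A, B):
            # Pairwise squared distances via broadcasting, then the kernel.
            d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
            return np.exp(-d2 / (2.0 * bandwidth ** 2))
        return k(X, X).mean() + k(Y, Y).mean() - 2.0 * k(X, Y).mean()
    ```

    Because MMD compares whole sample sets rather than single point estimates, it can reward a model for covering all plausible poses, which a per-sample joint error cannot.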

    State of the Art in Dense Monocular Non-Rigid 3D Reconstruction

    3D reconstruction of deformable (or non-rigid) scenes from a set of monocular 2D image observations is a long-standing and actively researched area of computer vision and graphics. It is an ill-posed inverse problem, since--without additional prior assumptions--it permits infinitely many solutions leading to accurate projection to the input 2D images. Non-rigid reconstruction is a foundational building block for downstream applications like robotics, AR/VR, or visual content creation. The key advantage of using monocular cameras is their omnipresence and availability to the end users as well as their ease of use compared to more sophisticated camera set-ups such as stereo or multi-view systems. This survey focuses on state-of-the-art methods for dense non-rigid 3D reconstruction of various deformable objects and composite scenes from monocular videos or sets of monocular views. It reviews the fundamentals of 3D reconstruction and deformation modeling from 2D image observations. We then start from general methods--that handle arbitrary scenes and make only a few prior assumptions--and proceed towards techniques making stronger assumptions about the observed objects and types of deformations (e.g. human faces, bodies, hands, and animals). A significant part of this STAR is also devoted to classification and a high-level comparison of the methods, as well as an overview of the datasets for training and evaluation of the discussed techniques. We conclude by discussing open challenges in the field and the social aspects associated with the usage of the reviewed methods.

    3D Morphable Face Models -- Past, Present and Future

    In this paper, we provide a detailed survey of 3D Morphable Face Models over the 20 years since they were first proposed. The challenges in building and applying these models, namely capture, modeling, image formation, and image analysis, are still active research topics, and we review the state-of-the-art in each of these areas. We also look ahead, identifying unsolved challenges, proposing directions for future research and highlighting the broad range of current and future applications.

    Cutting-edge advances in modeling the blood–brain barrier and tools for its reversible permeabilization for enhanced drug delivery into the brain

    The blood–brain barrier (BBB) is a sophisticated structure whose full functionality is required for maintaining the executive functions of the central nervous system (CNS). Tight control of transport across the barrier means that most drugs, particularly those of large size, including powerful biologicals, cannot reach their targets in the brain. Notwithstanding the remarkable advances in characterizing the cellular nature of the BBB and the consequences of BBB dysfunction in pathology (brain metastasis, neurological diseases), it remains challenging to deliver drugs to the CNS. Herein, we outline the basic architecture and key molecular constituents of the BBB. In addition, we review the current status of approaches that are being explored to temporarily open the BBB in order to allow accumulation of therapeutics in the CNS. Undoubtedly, the major concern in the field is whether it is possible to open the BBB in a meaningful way without causing negative consequences. In this context, we have also listed a few other important key considerations that can improve our understanding of the dynamics of the BBB. The authors, DCF, RLR and JMO, would like to thank the funds under the project 2IQBIONEURO (reference: 0624_2IQBIONEURO_6_E) co-funded by INTERREG (Atlantic program V-A Spain-Portugal) and the European Regional Development Fund (FEDER). Open Access funding enabled and organized by Projekt DEAL.

    An algorithm to compare two‐dimensional footwear outsole images using maximum cliques and speeded‐up robust feature

    Footwear examiners are tasked with comparing an outsole impression (Q) left at a crime scene with an impression (K) from a database or from the suspect's shoe. We propose a method for comparing two shoe outsole impressions that relies on robust features (speeded‐up robust feature; SURF) on each impression and aligns them using a maximum clique (MC). After alignment, an algorithm we denote MC‐COMP is used to extract additional features that are then combined into a univariate similarity score using a random forest (RF). We use a database of shoe outsole impressions that includes images from two models of athletic shoes that were purchased new and then worn by study participants for about 6 months. The shoes share class characteristics such as outsole pattern and size, and thus the comparison is challenging. We find that the RF implemented on SURF outperforms other methods recently proposed in the literature in terms of classification precision. In more realistic scenarios where crime scene impressions may be degraded and smudged, the algorithm we propose—denoted MC‐COMP‐SURF—shows the best classification performance by detecting unique features better than other methods. The algorithm can be implemented with the R‐package shoeprintr.
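    The maximum-clique alignment idea can be sketched in miniature: candidate SURF correspondences between Q and K become vertices of a consistency graph, edges connect pairs of correspondences that preserve inter-point distances, and the largest clique is the geometrically coherent alignment. The brute-force toy below assumes small candidate sets and 2D points with an arbitrary tolerance; the actual MC-COMP implementation in shoeprintr differs.

    ```python
    from itertools import combinations

    def max_clique_matches(q_pts, k_pts, candidates, tol=0.05):
        """Given candidate correspondences (i, j) between keypoints q_pts[i]
        on the questioned impression Q and k_pts[j] on the known impression K,
        return the largest subset whose pairwise distances agree, i.e. a
        maximum clique in the geometric-consistency graph. Exponential-time
        brute force; only suitable for small candidate sets."""
        def dist(a, b):
            return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

        def consistent(c1, c2):
            # An edge exists when the Q-side and K-side distances match.
            return abs(dist(q_pts[c1[0]], q_pts[c2[0]])
                       - dist(k_pts[c1[1]], k_pts[c2[1]])) <= tol

        # Try subset sizes from largest to smallest; the first fully
        # consistent subset found is a maximum clique.
        for r in range(len(candidates), 0, -1):
            for subset in combinations(candidates, r):
                if all(consistent(a, b) for a, b in combinations(subset, 2)):
                    return list(subset)
        return []
    ```

    Distance preservation makes the clique criterion invariant to rotation and translation between the two impressions, so no pose needs to be estimated before filtering out spurious matches.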